Researchers in mathematics education have studied errors for most of this century. Uhl (1917), for example, studied errors, and subsequent work was done by researchers such as Buswell and Judd (1925), Brueckner (1930) and Buckingham (1933). This interest continues to the present (e.g., Bunderson & Olsen, 1963). For a list of selected references, see Ashlock (1982).

In addition to studying mathematical errors for their own sake, some researchers have stressed the need to alert teachers to them in order to improve instruction (e.g., Cox, 1975; Fowler, 1980; Swan, 1983), since some students are reported to have confidence in their faulty procedures (Feghall, 1976; MacKay, 1975). According to West (1971), "There is hardly a skill in the teacher's repertoire that is more important than the ability to identify pupil errors and to prescribe appropriate remedial procedures," and errors may even serve as "springboards" for students to understand mathematics (Borasi, 1986). Further, as Brown and Burton (1978) note, ignoring or misinterpreting students' errors may be detrimental to students' motivation:

When a student's bug (which may only manifest itself occasionally) is not recognised by the teacher, the teacher explains the errant behaviour as carelessness, laziness, or worse, thereby often mistakenly lowering his opinions of the student's capabilities... From the student's viewpoint, the situation is much worse. He is following what he believes to be the correct algorithm and, seemingly at random, gets marked wrong (p. 285, italics in the original).

Training teachers to diagnose errors. Such concern has led to efforts to train teachers to diagnose errors. Brown and Burton (1978) successfully used a computer to tutor the diagnosis of errors in subtraction.


Since then, at least two further attempts have been made to train teachers to diagnose bugs using computers: one for addition and subtraction (De Corte, Verschaffel & Schrooten, 1986) and one for algebra (Schneider, Kelly, Blando, Martinak, Sleeman & Snow, 1986). [For an analogous study not using a computer tutor, see Dodd, Jones and Lamb (1975).]

De Corte et al. (1986) found that students who worked with their computer program (based on VanLehn's "Buggy Game") were superior to those in a control group in the ability to hypothesize a particular bug and to verify it by predicting the wrong answers that the bug would produce on a set of tasks.

In a pilot study, Schneider et al. (1986) found that teachers were better at diagnosing algebra errors after working with TPIXIE, part of a larger intelligent tutoring system (Sleeman, 1986). Although the teachers in the study enjoyed working with TPIXIE, their major criticism was that it did not present challenging tasks soon enough; this criticism has since been addressed. This paper reports on a follow-on study, the purpose of which is to test the effectiveness of the revised TPIXIE and to suggest further improvements to it.

Transfer of Training. A common concern in training is how well it transfers to related but previously unencountered tasks. De Corte et al. (1986) did not find transfer of training with their program. The results on transfer from the TPIXIE pilot study (Schneider et al., 1986) were encouraging but not definitive. Transfer of training therefore received further attention in this study.


Method 

Subjects. Thirty-six first-year elementary-school teacher-trainees from a Scottish college of education served as subjects. Each had completed a secondary-school course and so would have had training in mathematics, including algebra. (Ideally, we would have used secondary-school teacher-trainees, but such students at the College had already been trained in the diagnosis of errors.) Students were paid a nominal fee for their participation.

Materials. Two computer programs were used:

TPIXIE. TPIXIE (Sleeman, 1986) is designed to help the user diagnose a bug common to a set of equations (see Figure A). The user is shown a set of three task-student-answer pairs (task-answer pairs, for short), from which it is hoped that the user will determine the mal-rule the particular student is using. To test this, the user is presented with three further tasks and must respond to each with the answer they believe the student's buggy rule would produce. If the user diagnoses the bug correctly, a new set of equations is presented. Otherwise, the target set of equations for that task level is shown again. If the user is unable to diagnose the common bug in the target equations, a facility exists to show the pupil's faulty working (see Figure B). Following such feedback, TPIXIE proceeds to the next task level. As the user progresses through the 16 sets of equations, the bugs generally become more complicated. TPIXIE records each response made by the user.
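The verification step described above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not TPIXIE's implementation, whose internal representation the paper does not describe: a task ax = b is stored as the pair (a, b), and a mal-rule is a function returning the answer a pupil using that rule would give. The rule shown is the one in Figure A, where the pupil divides the coefficient by the constant; the names `divide_coefficient_by_constant` and `predictions_match` are hypothetical.

```python
from fractions import Fraction
from typing import Callable

# Hypothetical representation (not TPIXIE's own): a task a*x = b is the pair (a, b),
# and a mal-rule maps a task to the answer a pupil using that rule would give.
Task = tuple[int, int]
MalRule = Callable[[Task], Fraction]


def divide_coefficient_by_constant(task: Task) -> Fraction:
    """The mal-rule in Figure A: the pupil computes a/b instead of b/a."""
    a, b = task
    return Fraction(a, b)


def predictions_match(rule: MalRule, tasks: list[Task], predictions: list[Fraction]) -> bool:
    """True if every predicted answer equals the one the buggy rule produces."""
    return len(tasks) == len(predictions) and all(
        rule(task) == guess for task, guess in zip(tasks, predictions)
    )


# The rule reproduces the three worked examples shown to the user in Figure A ...
shown = [(3, 9), (4, 16), (5, 15)]
print([str(divide_coefficient_by_constant(t)) for t in shown])  # ['1/3', '1/4', '1/3']

# ... and accepts the user's three "solve like the student" responses.
tasks = [(4, 24), (5, 10), (2, 8)]
responses = [Fraction(1, 6), Fraction(1, 2), Fraction(1, 4)]
print(predictions_match(divide_coefficient_by_constant, tasks, responses))  # True
```

Treating a diagnosis as correct only when all three predictions match mirrors the description above; a real tutor would also need to decide how often to reshow a set, which the paper leaves unspecified.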

FPIXIE. FPIXIE presents a series of algebra equations one at a time and asks the user to solve them (see Figure C). FPIXIE comments simply on whether the answer was correct or not, and then presents a new item. In general, each item is more difficult than the one before. By using FPIXIE we were able to control for time spent on the computer and for the domain area covered. FPIXIE is part of the more general RPIXIE intelligent tutoring system (Sleeman, 1987).
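FPIXIE's feedback amounts to checking a typed answer against the true solution. A minimal sketch, assuming items of the form ax = b with exact rational answers; the function name `check_answer` is hypothetical, and the feedback strings are copied from Figure C.

```python
from fractions import Fraction


def check_answer(a: int, b: int, answer: Fraction) -> str:
    """Right/wrong feedback for a typed answer to a*x = b (assumes a != 0)."""
    if answer == Fraction(b, a):
        return "You got that one right!"
    return "Thank you, but you didn't get that one right."


# The two items shown in Figure C.
print(check_answer(3, -9, Fraction(-3)))   # correct: x = -3
print(check_answer(4, -16, Fraction(4)))   # incorrect: the solution is x = -4
```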

Pretest. Subjects were given a test comprising 28 task-answer pairs. No intermediate steps in the solution were shown; this allowed the investigators to assess diagnostic ability under the stringent condition of limited information.

The items were arranged in sets: Sets 1, 2, 3, 4 and 6 had five items each; Set 5 had three items. In Sets 1, 2, 4, 5 and 6 a single common faulty procedure (bug) underlay every error in the set, whereas each task-answer pair in Set 3 had a different bug. Set 3 was included to discourage subjects from presuming that the diagnosis of the first task-answer pair held for all others in the set.

Posttest. The posttest was similar in format to the pretest. The same bugs were used to generate the items, except in Set 5, in which the square root of the final answer was taken. The square root was inadvertently omitted from the Set 5 items on the pretest, so the bug underlying Set 5 on the posttest was more complex than that on the pretest. The bugs in both pre- and posttests were based on previously observed students' protocols.

Unlike the other sets, Sets 2 and 4 contained bugs not seen on TPIXIE. These sets were included to test for transfer of training.
Procedure


A pretest was administered to all the teacher-trainees taking part in the experiment, who were randomly assigned to one of the two conditions. Over a period of seven days immediately following the pretest, they worked with either the TPIXIE or the FPIXIE program for a single 50-minute period. Six days after the last teacher-trainees had worked with the computer, the entire sample was given the posttest. Teacher-trainees were allowed 50 minutes for each of the pretest and the posttest.


Results 


Two teacher-trainees from the treatment (TPIXIE) condition were absent from the pretest and did not take part in the experiment; their absence was unrelated to the experimental conditions. A further student was very poor at algebra and apparently did not understand what was required of her; consequently, her scores were dropped from the analyses. This left 16 students in the treatment condition and 17 in the control (FPIXIE) condition.


Diagnosing error patterns. We wished to see how well the teacher-trainees in each condition diagnosed the bugs underlying the sets of items that contain a single common bug (Sets 1, 2, 4, 5 and 6). Set 3 was omitted from both analyses because there is no bug common to its five items. We performed the following analyses:


Majority Match. Credit is given for the items in a set only if the majority of the task-answer pairs are matched to the known bug: for Sets 1, 2, 4 and 6, which contain 5 items each, a set can score 0 or between 3 and 5. Similarly, for Set 5, which contains 3 items, the score can be 0 or in the range 2 to 3. The maximum possible total score is 23. Using this criterion, the TPIXIE condition outscored the control condition at a statistically significant level (TPIXIE M = 13.38, control M = 8.65, t(31) = 2.26, p < .031).
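The Majority Match rule can be written as a small scoring function. This is a sketch of our reading of the criterion, assuming that a set which passes the majority threshold earns one point per matched pair (which yields the stated ranges of 0 or 3 to 5, and 0 or 2 to 3, and a maximum of 23); the function name and the trainee data are hypothetical.

```python
def majority_match_score(matches: list[bool]) -> int:
    """Score one item set under the Majority Match criterion.

    matches[i] is True when the trainee's diagnosis of task-answer pair i
    matched the known bug. The set earns one point per matched pair, but
    only if the matched pairs form a strict majority of the set; otherwise 0.
    """
    matched = sum(matches)
    return matched if matched > len(matches) / 2 else 0


# Hypothetical diagnoses for one trainee on Sets 1, 2, 4, 5 and 6 (Set 3 is excluded).
trainee = {
    1: [True, True, True, True, True],      # 5 of 5 -> 5
    2: [True, True, False, True, False],    # 3 of 5 -> 3
    4: [True, False, False, False, True],   # 2 of 5 -> 0 (not a majority)
    5: [True, True, False],                 # 2 of 3 -> 2
    6: [False, False, False, False, False], # 0 of 5 -> 0
}
total = sum(majority_match_score(m) for m in trainee.values())
print(total)  # 10 out of a possible 23
```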


Full Match. Credit is given for a set only if all of its task-answer pairs are matched to the known bug, so scores could range from 0 to 5 (there being 5 sets in all). Using this criterion, the TPIXIE condition again outscored the control condition, but the difference was not statistically significant (TPIXIE M = 2.25, control M = 1.47; t(31) = ...).
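A sketch of the Full Match criterion, assuming each set's diagnoses are recorded as one true/false value per task-answer pair; the function name and the data are hypothetical.

```python
def full_match_score(diagnoses: dict[int, list[bool]]) -> int:
    """Full Match criterion: one point for each item set in which every
    task-answer pair was matched to the known bug; partly matched sets score 0."""
    return sum(1 for matches in diagnoses.values() if matches and all(matches))


# Hypothetical diagnoses for one trainee on Sets 1, 2, 4, 5 and 6 (Set 3 is excluded).
trainee = {
    1: [True, True, True, True, True],
    2: [True, True, True, True, False],
    4: [False, False, False, False, False],
    5: [True, True, True],
    6: [True, True, True, False, True],
}
print(full_match_score(trainee))  # 2 (Sets 1 and 5), out of a possible 5
```

Under this criterion a trainee who matches four of five pairs in a set gets no credit for it, which is why Full Match scores are much lower and less sensitive than Majority Match scores.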


Pre- to Posttest Gain. Both groups gained significantly from pretest to posttest on both match scores (see Tables 1 and 2).

Transfer of training. To test for transfer of training, neither the bug in Set 2 nor that in Set 4 was shown on TPIXIE. A comparison of the combined Majority Match scores on these sets showed that the TPIXIE group scored significantly higher than the control group (TPIXIE M = 6.75, control M = 2.65, t(31) = 1_.29, p < .009). The Full Match analysis, however, showed no significant differences (TPIXIE M = .75, FPIXIE M = .41, t(31) = K39, p < .^_7).
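The between-group comparisons reported here appear to be ordinary independent-samples t tests with 16 + 17 - 2 = 31 degrees of freedom. Assuming a pooled-variance (Student) test, the overall Majority Match comparison can be recomputed from the posttest means and standard deviations in Table 1, and doing so reproduces the reported t(31) = 2.26. The helper `pooled_t` is our own illustration, not the authors' analysis code.

```python
import math


def pooled_t(m1: float, sd1: float, n1: int,
             m2: float, sd2: float, n2: int) -> tuple[float, int]:
    """Independent-samples t statistic assuming equal variances, from summary data."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error, df


# Posttest Majority Match means and SDs from Table 1 (TPIXIE n = 16, control n = 17).
t, df = pooled_t(13.38, 6.22, 16, 8.65, 5.79, 17)
print(f"t({df}) = {t:.2f}")  # t(31) = 2.26
```

If SciPy is available, `scipy.stats.ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2, equal_var=True)` returns the same statistic together with a two-sided p-value.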

Reliability Check. In scoring the raw data it was sometimes unclear whether or not an error had been diagnosed correctly. Some students provided explanations of errors which could have been interpreted as correct diagnoses, but which contained evidence leading the scorer to doubt this. Eight test sessions (four from the pretest and four from the posttest, each with two from the TPIXIE and two from the FPIXIE group, but otherwise selected randomly) were rechecked by an independent scorer, who was in 94.2% agreement with the original scorer. This casts some doubt upon the significance figures quoted in the rest of the paper. However, there was least reason for doubt when a trainee teacher had consistently diagnosed a full set of items. Thus the methods used to analyse the raw scores, which count multiple matches within sets (either majority match or full match) rather than scoring item by item, should have minimised any error due to inconsistencies in the scoring.
Discussion


The results of this study are generally encouraging for the further development of TPIXIE. If the majority scoring criterion is used, the students in the TPIXIE condition score significantly better than those in the control group, and even when the more stringent full-match criterion is applied, the results favour TPIXIE. (The Type I error rate associated with this comparison is probably acceptable for a program under development.) These results, together with the finding that both groups improved significantly over their own pretest performances, lead us to conclude that while students can learn to diagnose errors without the aid of TPIXIE, those who work with TPIXIE are likely to be more effective diagnosticians.

At least two strong qualifications need to be made to this assessment of TPIXIE: a) the subjects did not exactly match those for whom the program was intended (they were elementary rather than secondary teacher-trainees), and b) the dependent measure was not as sensitive as we would have wished. For this sample only Sets 2 and 6 seem to discriminate between the groups; Set 1 appears to be too simple, and Sets 4 and 5 appear to be too difficult (see Table 3).

Both conditions saw the bugs for Sets 2 and 4 on the pretest. These same bugs were not shown on TPIXIE. Nevertheless, teacher-trainees who worked with TPIXIE diagnosed these bugs on the posttest on a greater number of task-answer pairs than those in the control condition. In the next phase of the development of TPIXIE, we plan to include on the posttest bugs that appear on neither the pretest nor TPIXIE. In addition, we plan to include on the posttest bugs similar to those on TPIXIE (to measure near transfer) and ones quite dissimilar (to measure far transfer).


Comments by students. The students enjoyed using TPIXIE. Typical comments were 1) "No teacher of diagnosis was needed", 2) "I had little difficulty working the program", and 3) "I liked the remedial option" (the one that explains the common bug if the user cannot discern it). All but two of the teacher-trainees drew domain-independent lessons from their interaction with TPIXIE, such as the importance of making sure a pupil understands the rules of mathematics; the importance of having empathy for the learner who finds mathematics difficult; and the importance of knowing where a learner is going wrong in working tasks. The students who did not find TPIXIE helpful explained that they were elementary-school teacher-trainees and found the subject domain (algebra) unrelated to their own work.

These students made a number of suggestions for improving TPIXIE, including:

1. The user should be allowed to return to the current set of task-answer pairs after seeing just a small number of lines of remedial explanation (at present the user is shown the entire mis-working of the task). This number should be under the user's control. Such an option would allow the user to get "clues" to the pupil's bug, which could then be used in a new attempt at the target task-answer pairs.

2. The comments used to encourage the user (see Figure A) should be varied, as they may become repetitious over the 50-minute session.

3. A variant of TPIXIE should be built for high-school students to help them diagnose algebra errors, with the aim of improving their performance in algebra.


Future work. Possible changes to the present system include:

1. Replacing the present algebra bugs with bugs that we now know (Martinak et al., 1987) are more common among high-school students than those originally used. This change would make the skills learned more relevant to teachers.

2. As far as possible, TPIXIE should be tested on a sample of users that represents its target population, namely trainee secondary-school mathematics teachers.

3. Items should be pilot-tested to find ones that are neither too difficult nor too easy for the population under study, although introductory, easy items should still be included on the tests and in TPIXIE for motivational purposes.

4. The number of items per set on the tests should be reduced to three in all cases, which would allow additional sets of tasks to be worked in the same amount of time.

5. Items should be selected so that no item can be explained by more than one common bug.

6. A TPIXIE variant might be developed that does not rely so heavily on the user's ability to abstract an error from a set of incorrectly worked tasks. Such a variant might, for example, first give the user a list of known pupil bugs. TPIXIE would then show task-answer pairs and ask the user to diagnose the bugs. By comparing the results of these two versions of TPIXIE we could begin to learn how important having to discover the bug(s) is for subsequent diagnostic accuracy.

7. Finally, acting on the finding that those in the control group improved from pre- to posttest, one might consider including a pretest-posttest-only condition to see whether gains similar to those of the control group in this study are made.
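Suggestion 5 can be checked mechanically once candidate bugs are written down as rules. The sketch below assumes items of the form ax = b and a small hypothetical bug catalogue (only the first rule comes from the paper, Figure A; the others are illustrative); it lists the catalogued bugs that reproduce a student answer and flags items that more than one bug explains. For example, for -4x = -2 the answer x = 2 is produced both by dividing the coefficient by the constant and by subtracting the coefficient from the constant, so such an item would be ambiguous.

```python
from fractions import Fraction
from typing import Callable

# Hypothetical catalogue of mal-rules for tasks a*x = b. Only the first rule is
# taken from the paper (Figure A); the others are illustrative assumptions.
BUGS: dict[str, Callable[[int, int], Fraction]] = {
    "divides coefficient by constant": lambda a, b: Fraction(a, b),
    "subtracts coefficient from constant": lambda a, b: Fraction(b - a),
    "divides constant by coefficient, sign flipped": lambda a, b: Fraction(-b, a),
}


def explaining_bugs(a: int, b: int, student_answer: Fraction) -> list[str]:
    """Names of catalogued bugs that reproduce the student's answer to a*x = b."""
    return [name for name, rule in BUGS.items() if rule(a, b) == student_answer]


def is_ambiguous(a: int, b: int, student_answer: Fraction) -> bool:
    """True if more than one catalogued bug explains the item, so it should be replaced."""
    return len(explaining_bugs(a, b, student_answer)) > 1


print(explaining_bugs(3, 9, Fraction(1, 3)))   # only the Figure A bug
print(is_ambiguous(-4, -2, Fraction(2)))       # two catalogued bugs give x = 2
```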


Table 1

Composite Majority Match Pre- and Posttest Mean Scores by Condition

                        Descriptive statistics     Paired t-test
Condition    Test       Mean       SD              t       df     p
TPIXIE       Pre         3.94      3.64            5.89    15     .000
             Post       13.38      6.22
Control      Pre         3.12      3.92            5.05    16     .000
             Post        8.65      5.79

Note: Maximum score = 23
Table 2

Composite Full Match Pre- and Posttest Mean Scores by Condition

                        Descriptive statistics     Paired t-test
Condition    Test       Mean       SD              t       df     p


Table 3

Posttest Descriptive Statistics by Item Set by Condition

* Maximum score for Sets 1, 2, 4 and 6 = 5. Maximum score for Set 5 = 3. Set 3 was omitted from the analysis; see text.
Figure A

TPIXIE

What is student #1 doing wrong?

3 problems will be shown

The problem is 3x = 9
The student's answer was x = 1/3

The problem is 4x = 16
The student's answer was x = 1/4

The problem is 5x = 15
The student's answer was x = 1/3

Now solve the following problems like the student.
Take as much time as you need.

Solve (like the student) 4x = 24
Type Q to QUIT

x = 1/6

That looks OK. You seem to be on the right track.
Let's try another one.

Take as much time as you need.

Solve (like the student) 5x = 10
Type Q to QUIT

x = 1/2

That looks fine. I think you've got it!

Let's try one final item to be sure.

Take as much time as you need.

Solve (like the student) 2x = 8

x = 1/4

Bravo!

Do you wish to continue?


Figure B

TPIXIE: The remedial procedure at work. (Abbreviated format)

What is student #18 doing wrong?

17x = 2 * (4x + 3)
x = 1/3

8x = 2 * (3x + 5)
x = 5/2

19x = 2 * (4x + 3)
x = 3/11

Solve (like the student) 9x = 2 * (3x + 3)
Type Q to QUIT

Now, let's take a look at how the student would have solved the problem.

The student would have solved the problem like this:

9x = 2 * (3x + 3)

9x = 6x + 3
Did not multiply second term in brackets. (Hit C and RETURN: C)

9x - 6x = 3
The x-term moved from the right-hand side to the left-hand side and the sign changed. (Hit C and RETURN: C)

3x = 3
Added or subtracted 2 x-terms. (Hit C and RETURN: C)

x = 3/3
Both sides divided by the coefficient.

x = 1
Numerator and denominator divided by greatest common factor.

Finished.
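The bug worked through in Figure B can be stated as a rule and checked against all four tasks shown there. A minimal sketch, assuming tasks of the form ax = k(cx + d); the function names are hypothetical. The pupil multiplies only the first term in the brackets by k, so ax = kcx + d and x = d / (a - kc), whereas the correct solution is x = kd / (a - kc).

```python
from fractions import Fraction


def solve_without_distributing(a: int, k: int, c: int, d: int) -> Fraction:
    """Solve a*x = k*(c*x + d) the way the pupil in Figure B does (assumes a != k*c).

    Only the first bracketed term is multiplied by k: a*x = k*c*x + d.
    The x-term is moved to the left, x-terms are combined, and both sides are
    divided by the coefficient; Fraction then reduces by the greatest common factor.
    """
    return Fraction(d, a - k * c)


def solve_correctly(a: int, k: int, c: int, d: int) -> Fraction:
    """Correct solution of a*x = k*(c*x + d): x = k*d / (a - k*c) (assumes a != k*c)."""
    return Fraction(k * d, a - k * c)


# The tasks of Figure B, ending with the "solve like the student" task 9x = 2*(3x + 3).
for a, k, c, d in [(17, 2, 4, 3), (8, 2, 3, 5), (19, 2, 4, 3), (9, 2, 3, 3)]:
    print(f"{a}x = {k}*({c}x + {d}): pupil x = {solve_without_distributing(a, k, c, d)}, "
          f"correct x = {solve_correctly(a, k, c, d)}")
```

The pupil's answers 1/3, 5/2 and 3/11 all follow from this single rule, which is what makes the set diagnosable from task-answer pairs alone.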
Figure C

FPIXIE

This is Task-Set 2

You will be asked to work at most 6 tasks at this level.

Solve (f = Finished, q = Quit early)

3x = -9
S: x = -3
S: f

Thank you.

You got that one right!
Well done

Do you wish to continue? Please type YES or NO:
S: y

Solve (f = Finished, q = Quit early)

4x = -16
S: x = 4
S: f

Thank you,
but you didn't get that one right.

Do you wish to continue?
Please type YES or NO:
S: y